
The R Markdown source is available from the Code pulldown menu at the upper right (choose “Download Rmd”), or it can be downloaded from GitHub.


This protocol describes a network analysis workflow in Cytoscape for a set of differentially expressed genes.


1 Installation

if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")

if(!"RCy3" %in% installed.packages())
  BiocManager::install("RCy3")

library(RCy3)

2 Getting started

First, launch Cytoscape and keep it running whenever using RCy3. Confirm that you have everything installed and running:

    cytoscapePing()
    cytoscapeVersionInfo()
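
The connection check can also be wrapped so that a script stops early with a readable message when Cytoscape is not running. This is only a minimal sketch: `cytoscape_is_ready` is a hypothetical helper name (not part of RCy3), and it assumes that `cytoscapePing()` signals an R error when no connection can be made.

```r
# Hypothetical helper (not part of RCy3): returns TRUE if Cytoscape answers the ping.
# Assumption: cytoscapePing() raises an error when Cytoscape cannot be reached.
cytoscape_is_ready <- function() {
  if (!requireNamespace("RCy3", quietly = TRUE)) {
    return(FALSE)
  }
  tryCatch({
    RCy3::cytoscapePing()
    TRUE
  }, error = function(e) FALSE)
}

if (!cytoscape_is_ready()) {
  message("Cytoscape is not reachable; launch it before running the rest of this protocol.")
}
```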

3 Setup

If you haven’t already, install the STRING app and the yFiles Layout Algorithms app:

installApp('stringApp')
installApp('yfileslayoutalgorithms')

4 Background

Ovarian serous cystadenocarcinoma is a type of epithelial ovarian cancer which accounts for ~90% of all ovarian cancers. The data used in this protocol are from The Cancer Genome Atlas, in which multiple subtypes of serous cystadenocarcinoma were identified and characterized by mRNA expression.

We will focus on the differential gene expression between two subtypes, Mesenchymal and Immunoreactive.

For convenience, the data have already been analyzed and pre-filtered, using log fold change value and adjusted p-value.

5 Network Retrieval

Many public databases and multiple Cytoscape apps allow you to retrieve a network or pathway relevant to your data. For this workflow, we will use the STRING app.

6 Retrieve Networks from STRING

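To identify a relevant network, we will query the STRING database in two different ways: with protein queries for the up- and down-regulated genes, and with a disease query for ovarian cancer.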

6.1 STRING Protein Query: Up-regulated genes

  • Read the file containing the list of up-regulated genes, TCGA-Ovarian-MesenvsImmuno_UP.txt.
df <- read.table("https://cytoscape.github.io/cytoscape-tutorials/protocols/data/TCGA-Ovarian-MesenvsImmuno_UP.txt")
  • Run a STRING protein query with a confidence (score) cutoff of 0.4.
string.cmd = paste('string protein query query="', paste(df$V1, collapse = '\n'), '" cutoff=0.4  species="Homo sapiens"', sep = "")
commandsRun(string.cmd)
  • The resulting network will load automatically.

The resulting network contains up-regulated genes recognized by STRING, and interactions between them with an evidence score of 0.4 or greater.

getTableColumnNames('edge')
evidence.score <- getTableColumns('edge', "stringdb::score")
min(evidence.score)
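
Because the same query is issued again later for the down-regulated genes, the `paste()` call can be factored into a small helper. The sketch below is illustrative only: `build_string_protein_query` is a hypothetical name, and it simply mirrors the command text used in this protocol (newline-separated identifiers, the same cutoff and species). The gene symbols in the usage line are placeholders, not the TCGA list.

```r
# Hypothetical helper: assembles the STRING protein query command as a string.
# It only builds text; nothing is sent to Cytoscape here.
build_string_protein_query <- function(genes, cutoff = 0.4,
                                       species = "Homo sapiens") {
  genes <- unique(as.character(genes))
  genes <- genes[!is.na(genes) & nzchar(genes)]  # drop NA and empty entries
  if (length(genes) == 0) {
    stop("No gene identifiers supplied.")
  }
  paste0('string protein query query="', paste(genes, collapse = "\n"),
         '" cutoff=', cutoff, ' species="', species, '"')
}

# Placeholder symbols; duplicates are removed before the command is built.
cmd <- build_string_protein_query(c("TP53", "BRCA1", "TP53"))
cat(cmd, "\n")
# With Cytoscape running, the command would be sent with: RCy3::commandsRun(cmd)
```

Keeping the command construction separate from `commandsRun()` makes it easy to inspect the exact text before it is sent.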

7 Enrichment Analysis Options

Next, we will perform enrichment analysis using the STRING app. Note that several other enrichment options exist.

8 STRING Enrichment

The STRING app has built-in enrichment analysis functionality, which includes enrichment for GO Process, GO Component, GO Function, InterPro, KEGG Pathways, and PFAM.

The STRING app includes several options for filtering and displaying the enrichment results. The features are all available at the top of the STRING Enrichment tab.

9 STRING Protein Query: Down-regulated genes

Repeat the network search, enrichment analysis and visualization for the set of down-regulated genes:

df <- read.table("https://cytoscape.github.io/cytoscape-tutorials/protocols/data/TCGA-Ovarian-MesenvsImmuno_DOWN.txt")
string.cmd = paste('string protein query query="', paste(df$V1, collapse = '\n'), '" cutoff=0.4  species="Homo sapiens"', sep = "")
commandsRun(string.cmd)

Pro-tip: If you remove the Fill Color mapping from the Style Panel (right-click > Edit > Remove…), set the default to light gray, change the split donut to a Pie Chart in Settings and then try out some of the layouts (see Layout menu), you can end up with network views like this…

10 STRING Disease Query

Now, we will query the STRING disease database to retrieve a network of ovarian cancer associated genes, completely independent of our dataset.

string.cmd = 'string disease query disease="ovarian cancer"'
commandsRun(string.cmd)

This will bring in the top 100 ovarian cancer associated genes connected with a confidence score greater than 0.4. We did not pass these two values (100 and 0.4) as parameters because they are the defaults.
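
For reproducibility, the defaults can be written out explicitly. This is a minimal sketch, assuming the STRING app's disease query accepts `limit`, `cutoff` and `species` arguments; the values below only restate the defaults described above, and the variable names are illustrative.

```r
# Restate the defaults explicitly so the query does not depend on app settings.
disease <- "ovarian cancer"
limit   <- 100L   # number of top disease-associated genes to retrieve
cutoff  <- 0.4    # minimum confidence score for interactions

string.cmd <- sprintf(
  'string disease query disease="%s" limit=%d cutoff=%s species="Homo sapiens"',
  disease, limit, format(cutoff)
)
cat(string.cmd, "\n")
# With Cytoscape running: RCy3::commandsRun(string.cmd)
```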

11 Data integration

Next we will import log fold changes and p-values from our TCGA dataset and use them to create a visualization. Since the network and data use different identifiers, we first have to do some quick identifier mapping. In this case, we will use the gene symbol in the display name column to retrieve Entrez Gene identifiers.

getTableColumnNames('node')
mapped.cols <- mapTableColumn("display name",'Human','HGNC','Entrez Gene')

Here we set Human as species, HGNC as Map from, and Entrez Gene as To.

head(mapped.cols)
tail(getTableColumnNames('node'))
df <- read.csv("https://cytoscape.org/cytoscape-tutorials/protocols/data/TCGA-Ovarian-MesenvsImmuno_data.csv")
head(df)

Next, integrate the data with the network (node) table in Cytoscape:

  • The key column for the network should be Entrez Gene.
  • The key column for the data (df) should be Gene.

loadTableData(df, data.key.column = "Gene", table = "node", table.key.column = "Entrez Gene")

You will notice two new columns (logFC and FDR.adjusted.Pvalue) in the Node Table.

tail(getTableColumnNames('node'))
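
Conceptually, matching data rows to nodes by a shared key resembles a left join: every node is kept, and nodes without a matching data row receive no values. The sketch below illustrates that idea with base R `merge()` on small made-up tables. It is not what `loadTableData()` runs internally, and the logFC values are invented for illustration only.

```r
# Toy node table (stand-in for the Cytoscape node table after ID mapping).
node.table <- data.frame(
  `display name` = c("TP53", "BRCA1", "MYC"),
  `Entrez Gene`  = c("7157", "672", "4609"),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Toy expression data keyed by Entrez Gene ID (made-up values).
expr <- data.frame(Gene = c(7157, 672), logFC = c(1.2, -0.8))

# Keys must have the same type on both sides for the match to succeed.
expr$Gene <- as.character(expr$Gene)

# Left join: keep all nodes, attach logFC where the keys match.
joined <- merge(node.table, expr,
                by.x = "Entrez Gene", by.y = "Gene", all.x = TRUE)
print(joined)
```

In this toy example the third node has no row in the data, so its `logFC` is `NA`. In the real workflow such nodes simply have no value in the new columns.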

12 Visualization

We can now use the integrated data to create a network visualization.

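# Range of logFC values in the node table
logFC.table <- getTableColumns('node', "logFC")
logFC.min <- min(logFC.table, na.rm = TRUE)
logFC.max <- max(logFC.table, na.rm = TRUE)
print(logFC.min)
print(logFC.max)
# Largest absolute logFC, used to center the color gradient on 0
logFC.max.abs <- max(abs(logFC.min), abs(logFC.max))
getVisualStyleNames()
setVisualStyle("BioPAX")
setNodeFontSizeDefault(4, style.name = "BioPAX")
# Blue-white-red gradient: negative logFC is blue, positive logFC is red
setNodeColorMapping("logFC", c(-logFC.max.abs, 0, logFC.max.abs), c('#0000FF', '#FFFFFF', '#FF0000'), style.name = "BioPAX")
# Light gray for nodes without a logFC value
setNodeColorDefault('#D3D3D3', style.name = "BioPAX")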
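
The color mapping uses break points that are symmetric around zero, so that white always corresponds to no change, even when the observed minimum and maximum differ in magnitude. A minimal sketch of that calculation follows; `symmetric_breaks` is a hypothetical helper name, and the input values are made up.

```r
# Hypothetical helper: three break points (-m, 0, m), where m is the largest
# absolute value in the input. Accepts a vector or a one-column data frame.
symmetric_breaks <- function(values) {
  values <- as.numeric(unlist(values))
  max.abs <- max(abs(values), na.rm = TRUE)
  c(-max.abs, 0, max.abs)
}

toy.logFC <- c(-1.5, 0.3, NA, 2.75)  # made-up values for illustration
breaks <- symmetric_breaks(toy.logFC)
print(breaks)
```

With a range of -1.5 to 2.75, using the raw minimum and maximum as break points would still put white at 0, but the blue and red ends would then cover different magnitudes, making the two directions hard to compare visually.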

Pro-tip: If you apply the yFiles Organic layout, you can end up with network views like this… (The yFiles Layout Algorithms app does not support automation, so select the layout from the menu bar in Cytoscape Desktop.)

TCGA found several genes that were commonly mutated in ovarian cancer, so-called “cancer drivers”. We can add information about these genes to the network visualization by changing the visual style of these nodes. Three of the most important drivers are TP53, BRCA1 and BRCA2. We will add a thicker, colored border for these genes in the network.

selectNodes(c("TP53", "BRCA1", "BRCA2"), by.col = "display name")
setNodeBorderWidthBypass(getSelectedNodes(), 5)
setNodeBorderColorBypass(getSelectedNodes(), '#FF007F')
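
Before selecting, it can help to check which driver symbols are actually present in the network, since a disease query returns only a limited set of genes. The sketch below is illustrative: `node.names` is a made-up stand-in for the display name column of the node table, not the real query result.

```r
drivers <- c("TP53", "BRCA1", "BRCA2")

# Made-up stand-in for the display names in the node table, which could be
# fetched with getTableColumns('node', 'display name') in a live session.
node.names <- c("TP53", "BRCA1", "MYC", "KRAS")

present <- intersect(drivers, node.names)  # drivers found in the network
absent  <- setdiff(drivers, node.names)    # drivers missing from the network

if (length(absent) > 0) {
  message("Not in network: ", paste(absent, collapse = ", "))
}
# With Cytoscape running: selectNodes(present, by.col = "display name")
```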

The network will now look like this:

13 Other Analysis Options

14 Exporting Networks

Cytoscape provides a number of ways to export results and visualizations:

exportImage('./differentially-expressed-genes', 'PDF')
exportImage('./differentially-expressed-genes', 'PNG')
exportImage('./differentially-expressed-genes', 'JPEG')
exportImage('./differentially-expressed-genes', 'SVG')
exportImage('./differentially-expressed-genes', 'PS')
exportNetworkToNDEx("user", "password", TRUE)
exportNetwork('./differentially-expressed-genes', 'cyjs')
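
The NDEx call above uses placeholder credentials. One way to avoid writing real credentials into a script is to read them from environment variables. This is a sketch under stated assumptions: the variable names `NDEX_USER` and `NDEX_PASSWORD` and both helper functions are hypothetical, not part of RCy3 or NDEx.

```r
# Hypothetical helper: read NDEx credentials from environment variables.
# The variable names are an assumption; set them outside the script.
ndex_credentials <- function(user.var = "NDEX_USER",
                             pass.var = "NDEX_PASSWORD") {
  user <- Sys.getenv(user.var)
  pass <- Sys.getenv(pass.var)
  if (!nzchar(user) || !nzchar(pass)) {
    stop("Set ", user.var, " and ", pass.var, " before exporting to NDEx.")
  }
  list(username = user, password = pass)
}

# Defined but not called here, because it needs a running Cytoscape session.
upload_current_network <- function(is.public = FALSE) {
  cred <- ndex_credentials()
  RCy3::exportNetworkToNDEx(cred$username, cred$password, is.public)
}
```

The sketch defaults to a private upload (`is.public = FALSE`), whereas the call above passes `TRUE`; choose whichever fits your data-sharing requirements.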